Combining SAGE tags to predict genomic transcribed regions
Abstract
Analysis of several million expressed gene signatures (tags) has revealed a growing number of distinct sequences, far exceeding the number of annotated genes in mammalian genomes. Serial Analysis of Gene Expression (SAGE) can reveal new RNAs transcribed from previously unrecognized genomic regions. However, conventional SAGE tags are too short to unambiguously identify unique sites in large genomes. Here, we design a new strategy in which tags are anchored on two different restriction sites of cDNAs. A new transcript is then tentatively defined by the two SAGE tags in tandem and by the genomic sequence spanning the two tagged sites. Having developed a new algorithm to locate these tag-delimited genomic sequences, we first validated its ability to recognize known genes and to reveal new transcripts, using two SAGE libraries built in parallel from a single RNA sample. The algorithm is fast enough to apply this strategy on a large scale. We then collected and processed the complete sets of human SAGE tags to predict as-yet-unknown transcripts. Cross-validation with tiling array data shows that 47% of these tag-delimited genomic sequences overlap transcriptionally active regions. Our method provides a new, complementary approach to annotating complex transcriptomes.
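The core idea, a genomic interval delimited by two tags found in tandem on the same strand, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not describe. It assumes, hypothetically, that each tag is an exact 14-nt string (a 4-nt anchoring site such as CATG for NlaIII, as in conventional SAGE, or GATC for an example second enzyme, followed by 10 nt), that `max_span` is an arbitrary illustrative limit, and that transcripts are unspliced. The function names `locate_tag_pair`, `find_all` and `revcomp` are hypothetical. Pairing is naive and quadratic in the number of occurrences.

```python
import re
from typing import List, Tuple

# Complement table for uppercase DNA; N maps to N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(seq: str, pattern: str) -> List[int]:
    """Return start positions of every exact (possibly overlapping) match."""
    # A zero-width lookahead lets overlapping occurrences all be reported.
    return [m.start() for m in re.finditer(f"(?={re.escape(pattern)})", seq)]


def locate_tag_pair(genome: str, tag_a: str, tag_b: str,
                    max_span: int = 5000) -> List[Tuple[str, int, int]]:
    """Find genomic intervals delimited by two SAGE tags in tandem.

    Hypothetical sketch: both tags must match exactly on the same strand,
    in either order, and the interval from the first tag's start to the
    second tag's end must be at most max_span nt long. Returns
    (strand, start, end) tuples in 0-based, half-open coordinates on the
    forward (+) strand.
    """
    genome, tag_a, tag_b = genome.upper(), tag_a.upper(), tag_b.upper()
    hits = set()
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for a in find_all(seq, tag_a):
            for b in find_all(seq, tag_b):
                start = min(a, b)
                end = max(a + len(tag_a), b + len(tag_b))
                if end - start > max_span:
                    continue  # tags too far apart to delimit one transcript
                if strand == "-":
                    # Map reverse-complement coordinates back to the + strand.
                    start, end = len(genome) - end, len(genome) - start
                hits.add((strand, start, end))
    return sorted(hits)


if __name__ == "__main__":
    tag_a = "CATGACGTACGTAC"  # hypothetical NlaIII-anchored tag
    tag_b = "GATCTTGCAAGTCA"  # hypothetical tag anchored on a second site
    genome = "AAAA" + tag_a + "T" * 10 + tag_b + "AAAA"
    print(locate_tag_pair(genome, tag_a, tag_b))  # → [('+', 4, 42)]
```

A tag pair that yields exactly one interval would tentatively define a single transcribed region, whereas several hits would leave the location ambiguous. A production tool would more likely index the genome once (for example with a hash of all 14-mers) rather than scan it for each pair.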